Fast forward a few years, and Tim was working at a small startup that was in the midst of a classic code rewrite, from Perl to Java. The rewrite couldn't happen all at once; it had to be done iteratively, which was a less common idea back then. They wanted to make sure that site URLs wouldn't change, as typically happens when you switch from one server technology to another, and decided that putting Apache and mod_rewrite in front of JSP would be enough. Despite the iterative development, the switch-over happened in a single late-night session. Again, there were no dedicated fault-monitoring services at the time; the closest thing was a daily summary of problems, delivered after they had already happened.
It seemed like everything was working, but the team was only hitting static pages. In fact, the rewrite rules were creating infinite loops, and the server ground to a halt. A rollback wasn't possible because of schema changes, and Tim had no idea what to do. The founders were screaming, everything was broken, and the team was stumped. They stayed up all night thinking of options, even reading through the mod_rewrite source code to see if there was an option they had missed. Hours later, they found a regex in the rewrite rules that was missing an asterisk, and that missing wildcard had brought everything down. There was no redundancy and no failover, and the company lost hundreds of thousands of dollars in business. Teams still conducted postmortems back then, but they were not blameless; everyone was tired, cranky, and annoyed with the world.
Lessons learned: The experience pushed the team to develop a split deployment process, rollbacks, and other best practices. Still, there's nothing worse than having no idea how to solve a problem.